-
Notifications
You must be signed in to change notification settings - Fork 1.9k
ci: parallelize the mdbook build <language> process #2772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Down from 25 minutes to 11 minutes. Less than expected, but also less than half the previous run time |
Nice!! |
This now finished (even with missing rust caches!) in 21 minutes. https://github.com/google/comprehensive-rust/actions/runs/15612132409 With caching this would get down quite a lot. @mgeisler can you review the comprehensive-rust-all artifact if that is the correct structure? This would show that uploading the intermediate steps and artifact downloading with merge works as expected. The workflow still needs to be refactored a little bit as there is quite some duplication now but this only serves as a POC as of now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach, but I admit I don't know much about GitHub actions so if there are any subtle gotcha's I would not be the person to detect them.
- name: Update Rust | ||
run: rustup update | ||
|
||
- name: Setup Rust cache | ||
uses: ./.github/workflows/setup-rust-cache | ||
|
||
- name: Install Gettext | ||
run: | | ||
sudo apt update | ||
sudo apt install gettext | ||
- name: Install mdbook | ||
uses: ./.github/workflows/install-mdbook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small question: do we need to install Rust, mdbook
, and the Gettext tools here? I would guess not since everything is built already, no? We're just putting the pieces together and uploading it all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No... but we do ;)
We actually only need i18n-report, that is installed in the install-mdbook step cd922c4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code changed over time a bit but essentially cargo xtask install-tools
installs the necessary tool at this moment. For simplification and POC this is just installed as is. This needs to be refactored
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know exactly if i18n-report requires gettext.
I could upload the .cargo/bin/i18n-report binary in the previous job then download it in all jobs to use use that. This has the potential to be a bit more obscure in the future. But it probably is already pretty obscure where this is from at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while doing this and already making this a bit more obscure, I could create a tool-building phase that uploads all rust tools so we can upload the entire .cargo/bin/ directory and just use this in every translation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uploads all rust tools so we can upload the entire .cargo/bin/ directory and just use this in every translation
Yes, that would work! Alternatively and perhaps better: we don't need to build things like mdbook
, we can download a binary from https://github.com/rust-lang/mdBook/releases.
There is cargo-binstall
(https://github.com/cargo-bins/cargo-binstall) which I think can help?
The cool way would then be to publish have a little GitHub action so that it becomes very convenient to use mdbook
. We can publish binaries for mdbook-i18n-helpers
as well and install them the same way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed we need gettext because msgmerge is used in line 69
comprehensive-rust/.github/workflows/publish.yml
Lines 65 to 70 in 2af48cf
- name: Build synced translation report | |
run: | | |
cp -r po synced-po | |
MDBOOK_OUTPUT='{"xgettext": {"pot-file": "messages.pot", "granularity": 0}}' mdbook build -d synced-po | |
for file in synced-po/*.po; do msgmerge --update $file synced-po/messages.pot ; done | |
i18n-report report book/html/synced-translation-report.html synced-po/*.po |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for the research!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing this, I would guess we could get a speed-up if i18n-report
was moved to a job running in parallel with all the translations.
The final publish job would then become truly simple: it would gather all the HTML from the translations, plus the HTML from i18n-report
, and publish it all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to resolve this and then we can potentially move i18n-report
later!
I think this is a great approach, well done, @michael-kerscher! Originaly, we only had a simple As for the caching, if building |
f2efc52
to
6499784
Compare
0ff1f44
to
8499f3b
Compare
# See also TRANSLATIONS.md. | ||
|
||
on: | ||
pull_request: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be removed before merging
- main | ||
workflow_dispatch: | ||
|
||
permissions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be re-added before merging
9a54440
to
149216f
Compare
With #2915 merged, I was able to remove the extra build step, creating and downloading artifacts again. The change is now reduced to the split into
the previous sequential build of the language is now using individual jobs to do this in parallel and it looks way better. Current state: For maximum efficiency we still (according to the current builds) need binaries for
and we could shave some minutes of the process by caching the installation that is pretty slow e.g. with https://github.com/marketplace/actions/install-and-cache-apt-tools |
Ah, fun, I had a similar idea today and put #2916 up for review. Looking over the code, I don't see how this can be quicker than downloading things from the (internal) Apt mirrors GitHub maintains. My thinking with regards to caching is that it should be used when the thing being cached takes a significant amount of time to create. Unpacking binaries from the GitHub cache or from an Apt mirror ought to be comparable in time. In #2916, I found that most time was spent on updating the man page database... So telling |
language: | ||
- "en" | ||
- "ar" | ||
- "bn" | ||
- "da" | ||
- "de" | ||
- "el" | ||
- "es" | ||
- "fa" | ||
- "fr" | ||
- "id" | ||
- "it" | ||
- "ja" | ||
- "ko" | ||
- "pl" | ||
- "pt-BR" | ||
- "ro" | ||
- "ru" | ||
- "tr" | ||
- "uk" | ||
- "vi" | ||
- "zh-CN" | ||
- "zh-TW" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine (and easy to sort!), but just in case you hadn't thought of it,
language: ["en", "ar", ...]`
would be equivalent.
Exactly that was my thought, installing the packages did something with the content of the packages and processing the documentation. You approach is even better as there is no use for documentation in the CI environment (at least not yet, AI is not embedded deep enough that it requires that documentation :D) |
This reverts commit de81b2f.
This reverts commit 9ad2542.
…steps. This should be a massive performance gain in the publish step as we don't build several rust tools from scratch anymore for each language
… build/install logs
…ng that does not preserve permissions)
This needs to be refactored later
The install-mdbook step now uses --binstall where available. Once we build more binaries for our dependencies, this becomes faster without further work
149216f
to
49da39b
Compare
run: | | ||
sudo apt update | ||
sudo apt install gettext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should now use the new apt-get-install
action 🎉
|
||
- name: Upload artifact | ||
uses: actions/upload-pages-artifact@v4 | ||
# TODO: remove this before merging into main. This is just for verification of the build result while developing this workflow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a comment here so we don't forget.
uses: actions/deploy-pages@v4 | ||
name: comprehensive-rust-all | ||
path: book/ | ||
# TODO: uncomment the following lines before merging into main. This is removed for development purposes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another comment to ensure we don't forget.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me and I'm happy for it to merge when you feel confident with it too.
Make the mdbook build process parallel. This is a simple approach by just cloning the entire repository into new directories and running the process in parallel. This is a demonstration for #2767.
The process appears to be mostly CPU bound and not memory or disk heavy so the amount of cores on the CI runner matter.
mdbook build does not seem to use more than 1 core and a typical github CI machine has 4 cores. So the expected speedup would be 4x.
Currently 30 minutes are used for publish, 25 minutes of this is build process, this would be reduced to 5 + 6.25 = ~12 minutes, thus more than halving the process.